MEASURES OF PERSON FIT ON ACHIEVEMENT TESTS

by 
Chair: Linda M. Crocker

Major  Department:    Foundations  of  Education 

The  purpose  of  this  study  was  to  investigate  the  relationship 
between  an  examinee's  level  of  test  anxiety  and  each  of  five  different 
person-fit  statistics  and  to  establish  if  this  relationship  is 
dependent  on  ability  level.    A  secondary  interest  was  to  determine  the 
relationship  among  person-fit  indices  within  and  across  different 
subject  areas  of  a  standardized  achievement  test  and  to  assess  the 
internal  consistency  of  person-fit  indices. 

An  existing  data  set  was  analyzed  to  explore  the  nature  of  these 
relationships  for  the  modified  caution  index,  the  personal  biserial 
correlation,  the  norm  conformity  index,  the  Rasch  person-fit  index, 
and  the  extended  caution  index.    Achievement  test  scores  on  the 
reading,  mathematics,  and  science  subtests  of  the  Metropolitan 
Achievement  Test  (MAT)  and  scores  on  the  Test  Anxiety  Scale  for 
Adolescents  were  used  as  estimates  of  ability  and  anxiety.    The  item 
scores  and  total  test  scores  of  225  seventh-graders  and  188  eighth- 
graders  of  a  metropolitan  middle  school  comprised  this  data  set. 

Intercorrelations among the measures of person-fit were in
the  .80s  to  .90s  within  same-subject  content  areas.    Across  subject 
content  areas  little  or  no  relationship  was  found.    Low  to  moderate 
correlations  were  obtained  between  person-fit  indices  and  their 
corresponding ability scores (|.00 to .50|) and test anxiety (|.02
to .22|). For four of the measures of person fit, on one or more of the
subject  tests,  a  significant  proportion  of  variance  was  explained  by 
the  linear  combination  of  ability,  test  anxiety,  and  their 
interaction.    In  these  cases  ability  levels  moderate  the  relationship 
between  person-fit  measures  and  test  anxiety;  for  lower  ability 
examinees  the  relationship  is  direct,  but  for  higher-ability  examinees 
the  relationship  is  inverse.    Only  the  Rasch  person-fit  index  was 
consistently  unaffected  by  this  interaction.    Corrected  split-half 
reliability  estimates  of  person-fit  indices  were  low  (.20  to  .56), 
indicating  little  consistency  of  the  trait. 

According  to  these  results,  the  potential  uses  of  person-fit 
indices  are  questionable  at  this  time.    More  research  is  needed  before 
these  measures  can  be  recommended  for  routine  use  in  interpretation  of 
achievement  test  scores  for  individual  examinees. 

CHAPTER I
INTRODUCTION 


Statement  of  the  Problem 

Although  total  scores  have  been  consistently  used  as  the  basis  to 
evaluate  educational  achievement,  analysis  of  item-response  patterns 
can  contribute  additional  information  that  may  be  useful  in  the 
interpretation  of  an  overall  score.    Analysis  of  response  patterns  can 
be  based  on  two  dimensions:    item  difficulty  and  examinee  ability. 
Ability  is  typically  estimated  by  the  total  score  on  the  test  of 
interest,  and  item  difficulty,  by  the  proportion  of  examinees 
answering  the  item  correctly.    If  the  items  are  arranged  in  ascending 
order  of  difficulty,  an  examinee  with  a  given  ability  should  answer 
items  correctly  until  the  point  where  his  or  her  ability  matches  the 
difficulty  of  the  items,  and  miss  each  item  thereafter.  Deviations 
from  the  expected  response  pattern  occur  when  the  pattern  of  passed 
and  missed  items  is  not  consistent.    If  a  person  misses  easier  items 
but  then  responds  correctly  to  harder  items,  there  is  deviation  from 
the  expected  response  and  misfit  occurs. 

With  the  introduction  of  the  scalogram  technique,  Guttman  (1941, 
1950)  was  one  of  the  first  social  scientists  to  suggest  that  some 
persons  respond  consistently  to  a  given  set  of  ordered  stimuli  (test 
items) while others do not. Under Guttman's scale theory, a response
pattern in which a student who passes a more difficult item also responds
correctly to all easier items is called a perfect simplex, and a
scale or test that yields such patterns is called a perfect scale.
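
As an illustration only, the notion of a perfect pattern can be checked mechanically. The following is a minimal sketch in Python, which is not part of the original study; the function name count_guttman_errors and the two example patterns are hypothetical. It assumes responses are scored 0/1 and are already ordered from the easiest to the hardest item, and it counts the item pairs in which an easier item is missed while a harder item is passed.

```python
def count_guttman_errors(pattern):
    """Count pairs (easier item missed, harder item passed).

    `pattern` is a list of 0/1 item scores ordered from the easiest
    to the hardest item. A perfect (Guttman) pattern has zero errors.
    """
    errors = 0
    misses_so_far = 0  # easier items answered incorrectly so far
    for score in pattern:
        if score == 0:
            misses_so_far += 1
        else:
            # every earlier (easier) miss forms one error with this pass
            errors += misses_so_far
    return errors


perfect = [1, 1, 1, 0, 0]   # passes the easy items, misses the hard ones
aberrant = [0, 1, 1, 0, 1]  # misses the easiest item, passes the hardest
perfect_errors = count_guttman_errors(perfect)
aberrant_errors = count_guttman_errors(aberrant)
```

A count of zero corresponds to the perfect pattern described above; larger counts indicate greater departure from it.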

During  the  late  1970s  and  early  1980s  there  has  been  a  resurgence 
of  interest  in  using  information  provided  by  response  patterns.  A 
number  of  person-fit  statistics  have  been  developed  to  provide  a 
measure  of  an  individual  examinee's  deviation  from  the  expected 
response  pattern  to  a  given  set  of  items.    Although  some  studies  have 
shown  that  indices  of  person-fit  are  highly  correlated  (Harnisch  & 
Linn,  1981;  Rudner,  1983),  attempts  to  identify  causes  of  person 
misfit  (or  even  personality  or  demographic  correlates  of  it)  have 
remained  mainly  speculative.    Some  researchers,  such  as  Frary  (1982) 
and  Harnisch  and  Linn  (1981),  have  suggested  that  one  factor  which  may 
contribute  to  person-misfit  on  cognitive  tests  is  test  anxiety,  but 
prior  to  this  study  there  has  been  no  empirical  investigation  to  test 
this  hypothesis. 

Purpose  of  the  Study 

The  present  exploratory  study  was  designed  to  investigate  the 
nature  of  the  relationship  between  measures  of  person-fit  and  test 
anxiety.    For  each  of  five  selected  indices  of  fit  (modified  caution 
index,  personal  biserial  correlation,  norm  conformity  index,  Rasch 
person-fit  index,  and  an  extended  caution  index),  the  following 
questions  were  asked: 

1.    What  is  the  degree  of  linear  relationship  between  test 
anxiety  and  an  examinee's  level  of  misfit? 

2. What is the degree of linear relationship between ability (as
defined  by  performance  on  the  current  achievement  test)  and  level  of 
misfit? 

3.  To  what  extent  is  variance  in  person-misfit  explained  by  a 
linear  combination  of  the  variables:    ability  level,  test  anxiety, 
and  their  interaction? 
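
The third question describes a regression of misfit on ability, test anxiety, and their product. The following is a minimal sketch in Python, offered only as an illustration and not as the analysis procedure actually used in the study. All function names are hypothetical, and the data are invented and noise-free, constructed so that the anxiety slope is positive at low ability and negative at high ability.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_moderated_regression(ability, anxiety, misfit):
    """Least-squares fit of misfit on ability, anxiety, and their product.

    Returns (coefficients, r_squared); coefficients are ordered as
    intercept, ability, anxiety, ability x anxiety.
    """
    rows = [[1.0, a, t, a * t] for a, t in zip(ability, anxiety)]
    k = 4
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y for r, y in zip(rows, misfit)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(c * x for c, x in zip(beta, r)) for r in rows]
    mean_y = sum(misfit) / len(misfit)
    ss_res = sum((y - f) ** 2 for y, f in zip(misfit, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in misfit)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical, noise-free data on a 3 x 3 grid of ability and anxiety.
ability = [a for a in (1.0, 2.0, 3.0) for _ in range(3)]
anxiety = [t for _ in range(3) for t in (1.0, 2.0, 3.0)]
misfit = [0.2 + 0.05 * a + 0.10 * t - 0.05 * a * t
          for a, t in zip(ability, anxiety)]
beta, r2 = fit_moderated_regression(ability, anxiety, misfit)
```

In this sketch the slope of misfit on anxiety at a given ability level is beta[2] + beta[3] * ability, so a negative interaction coefficient means the relationship weakens and eventually reverses as ability rises.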

A  secondary  interest  in  this  investigation  was  to  explore  the 
degree  of  relationship  among  the  five  selected  person-fit  indices 
within  and  across  subject  area  tests  of  an  achievement  battery  and  to 
estimate  their  internal  consistencies.    This  information  was 
considered  important  because  the  tests  used  in  this  case  were  subtests 
from  a  well-known  nationally  norm-referenced,  standardized  achievement 
test  battery.    Earlier  studies  of  the  interrelationship  and 
reliability  of  person-fit  indices  have  typically  been  based  upon  state 
minimal  competency  examinations  (Harnisch  &  Linn,  1981)  or  locally 
developed  teacher-made  tests  (Frary,  1982). 
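
One way to estimate the internal consistency of a person-fit index is a split-half correlation stepped up to full test length. The sketch below, in Python, assumes an odd-even split of the items and the Spearman-Brown correction; both are assumptions made for illustration, since the split actually used is not described here. The function names and the five pairs of values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2.0 * r_half / (1.0 + r_half)


# Hypothetical fit-index values for five examinees, computed separately
# on the odd-numbered and the even-numbered items of one subtest.
fit_odd = [0.10, 0.35, 0.20, 0.50, 0.30]
fit_even = [0.20, 0.25, 0.30, 0.40, 0.15]
r_half = pearson_r(fit_odd, fit_even)
reliability = spearman_brown(r_half)
```

The corrected value is always larger in magnitude than a positive half-test correlation, which is why low corrected estimates indicate little consistency.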

Theoretical  Rationale 

To  date  most  research  on  test  anxiety  has  considered  primarily 
the  effects  on  examinees'  total  test  score.    Recently,  Frary  (1982) 
and  Harnisch  and  Linn  (1981)  have  suggested  that  test  anxiety  may  be  a 
factor which contributes to erratic performance of an examinee within
a  given  test  (e.g.,  missing  relatively  easy  items,  while  answering 
more  difficult  items  correctly).    Careless  errors  and  lack  of 
concentration  by  high-test-anxious  individuals  could  change  the 
pattern  of  item  responses  from  the  pattern  that  would  be  expected. 

Two theories predict the effect of anxiety on performance.
According  to  the  cognitive  attentional  theory  of  test  anxiety,  highly 
anxious  students  attend  to  self-relevant  variables  instead  of  to  task- 
relevant  variables,  negatively  affecting  their  performance  (Wine, 
1980). In an analysis of Spielberger's (1966, 1971) extension of
Spence-Taylor drive theory, Heinrich and Spielberger (1982) make
several predictions about the effect of ability and anxiety on
performance of tasks with varying levels of difficulty that seem
relevant to this study of test anxiety and person-fit. These are as
follows:

1.  For  subjects  with  superior  intelligence,  high  anxiety 
will  facilitate  performance  on  most  learning  tasks.  While 
high  anxiety  may  initially  cause  performance  decrements  on 
very  difficult  tasks,  it  will  eventually  facilitate  the 
performance  of  bright  subjects  as  they  progress  through 
the  task  and  correct  responses  become  dominant. 

2.  For  subjects  of  average  intelligence,  high  anxiety 
will  facilitate  performance  on  simple  tasks  and,  later 
in  learning,  on  tasks  of  moderate  difficulty.    On  very 
difficult  tasks,  high  anxiety  will  generally  lead  to 
performance  decrements. 

3. For low intelligence subjects, high anxiety may facilitate
performance on simple tasks that have been mastered.
However, performance decrements will generally be
associated with high anxiety on difficult tasks, especially
in the early stages of learning. (Heinrich & Spielberger,
1982,  p.  147) 

According  to  these  predictions,  response  patterns  and  person-fit 
statistics  will  be  different  for  high,  average,  and  low  ability 
examinees depending on their anxiety levels. The predicted effects for
high and low anxious students at these three ability levels would be
as follows:

1.    Subjects  with  high  ability  and  high  test  anxiety  would  be 
expected to fail hard items initially, but because examinees receive no
feedback under testing conditions, correct responses would not
be expected to become dominant. These examinees would continue to
have  occasional  difficulty  on  harder  items,  but  their  high  levels  of 
test  anxiety  would  facilitate  performance  on  easier  items.  Moderate 
to  low  misfit  would  be  expected.    For  subjects  with  high  ability  and 
low  test  anxiety,  interference  in  performance  of  difficult  items  is  not 
expected.    Due  to  less  attentional  interference,  use  of  test-taking 
strategies  might  also  be  more  accessible.    These  students  might  be 
more  open  to  guessing  on  harder  items.    Moderate-high  misfit  could  be 
expected. 

2.  For  subjects  with  average  ability,  high  anxiety  will  help 
with  easy  to  moderately  difficult  items,  but  will  interfere  with 
harder  items.    A  low  to  moderate  misfit  would  be  predicted. 
Similarly,  low  test  anxiety  is  not  expected  to  differentially  affect 
item  responses  for  average-ability  examinees. 

3.  For  low  ability  subjects,  high  anxiety  may  help  with  the 
easier  items  but  will  interfere  with  performance  on  more  difficult 
items. If these examinees do not feel very confident in their
knowledge, high anxiety might not help and might instead impair their
concentration, leading them to answer more erratically. In this last case,
higher  misfit  might  occur.    When  low  ability  subjects  are  also  low  in 
test  anxiety  no  interference  is  expected;  these  students  will  probably 
direct  their  attention  to  the  easier  items  they  master.    A  low  to 
moderate  misfit  is  expected. 

Definition  of  Technical  Terms 

Definitions  and  formulas  required  to  explain  major  technical 
terms  used  in  this  study  are  as  follows: 

Student-Problem Table (S-P table)

The  S-P  table  is  used  to  organize  test  information  into  a  matrix 
of  zeros  and  ones.    The  rows  in  this  matrix  represent  the  students 
ranked  from  highest  to  lowest  according  to  total  test  score.  The 
columns  represent  the  items  arranged  from  left  to  right  in  ascending 
order  of  difficulty.    Correct  responses  are  represented  by  ones  and 
incorrect  responses  are  represented  by  zeros.    Assuming  that  items  are 
arranged  in  increasing  order  of  difficulty  (from  easy  to  hard),  a 
concordant  response  pattern  is  one  in  which  an  examinee  answers  the 
items  correctly  until  he  or  she  reaches  an  item  that  is  too  difficult 
and  answers  the  items  incorrectly  from  then  on.    If  all  examinees  had 
concordant  response  patterns,  the  S-P  matrix  would  have  all  ones  in 
the  upper  left-hand  corner,  and  all  zeros  in  the  lower  right-hand 
corner.    A  short  illustrative  table  of  the  ideal  response  pattern  is 
presented  as  Table  1. 


Table 1

S-P Table for Five Examinees and Six Items (Ideal Pattern)

                        Item j
Examinee i      1    2    3    4    5    6

    1           1    1    1    1    1    0
    2           1    1    1    1    0    0
    3           1    1    1    0    0    0
    4           1    1    0    0    0    0
    5           1    0    0    0    0    0

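
The construction of an S-P table can be sketched in a few lines. The following Python fragment is an illustration only and is not part of the original study; the function name build_sp_table is hypothetical, and the input matrix is simply the ideal pattern of Table 1 with its rows and columns shuffled. Ties in total score or item difficulty are left in their original order, which is an assumption of this sketch.

```python
def build_sp_table(responses):
    """Sort a 0/1 response matrix into S-P order.

    Rows (examinees) are ordered from highest to lowest total score;
    columns (items) from easiest to hardest, i.e. by descending number
    of correct responses. Ties keep their original order.
    """
    n_items = len(responses[0])
    item_totals = [sum(row[j] for row in responses) for j in range(n_items)]
    item_order = sorted(range(n_items), key=lambda j: -item_totals[j])
    row_order = sorted(range(len(responses)), key=lambda i: -sum(responses[i]))
    table = [[responses[i][j] for j in item_order] for i in row_order]
    return table, row_order, item_order


# Hypothetical data: the ideal pattern of Table 1, rows and columns shuffled.
shuffled = [
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 0, 1, 1, 1],
]
sp_table, row_order, item_order = build_sp_table(shuffled)
```

After sorting, sp_table reproduces the triangular arrangement of ones and zeros shown in Table 1.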

Modified  Caution  Index  (MCI) 

Harnisch  and  Linn  (1981)  introduced  the  modified  caution  index  as 
a  modification  of  an  earlier  caution  index  proposed  by  Sato  in  1975 
(cited  in  Harnisch  &  Linn,  1981).    The  MCI  has  a  lower  bound  of  0  and 
an  upper  bound  of  1.    The  higher  the  value  of  the  index,  the  more 
divergent  is  the  person's  response  pattern.    This  index  is  computed 
with  data  arranged  into  an  S-P  table,  using  the  following  formula: 

\[
\mathrm{MCI}_i = \frac{\displaystyle\sum_{j=1}^{n_{i.}} (1 - u_{ij})\, n_{.j} \;-\; \sum_{j=n_{i.}+1}^{J} u_{ij}\, n_{.j}}{\displaystyle\sum_{j=1}^{n_{i.}} n_{.j} \;-\; \sum_{j=J+1-n_{i.}}^{J} n_{.j}} \tag{1}
\]

where i is the examinee index in the S-P matrix, j is the item index,
J is the number of items, $u_{ij}$ is 1 if examinee i answers item j
correctly and 0 if examinee i answers item j incorrectly, $n_{i.}$ is
the total number of correct responses for examinee i, and $n_{.j}$ is
the number of correct responses to item j.
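
A minimal sketch of the MCI computation in Python follows, as an illustration only and not the program used in the study. It assumes the items are already in S-P order (easiest to hardest), that the index compares the examinee's weighted misses among the first n_i items and weighted passes among the remaining items with the difference between the item totals of the n_i easiest and the n_i hardest items, and that the item totals are those of Table 1. The function name is hypothetical.

```python
def modified_caution_index(row, item_totals):
    """Modified caution index for one examinee.

    row         : 0/1 responses, items ordered easiest to hardest
    item_totals : number of correct responses per item, same order
    """
    n_items = len(row)
    n_correct = sum(row)
    # Misses among the n_correct easiest items, weighted by item totals,
    # minus passes among the remaining (harder) items.
    numerator = (
        sum((1 - row[j]) * item_totals[j] for j in range(n_correct))
        - sum(row[j] * item_totals[j] for j in range(n_correct, n_items))
    )
    # Totals of the n_correct easiest items minus the n_correct hardest.
    denominator = (
        sum(item_totals[:n_correct])
        - sum(item_totals[n_items - n_correct:])
    )
    if denominator == 0:
        raise ValueError("MCI is undefined for this response pattern")
    return numerator / denominator


item_totals = [5, 4, 3, 2, 1, 0]  # column sums of Table 1
mci_ideal = modified_caution_index([1, 1, 1, 0, 0, 0], item_totals)
mci_reversed = modified_caution_index([0, 0, 0, 1, 1, 1], item_totals)
mci_mixed = modified_caution_index([1, 0, 1, 1, 0, 0], item_totals)
```

The ideal pattern sits at the lower bound of the index and the fully reversed pattern at the upper bound; perfect and zero scores make the denominator zero and raise an error in this sketch.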

Personal  Biserial  Correlation  (PB) 

Donlon and Fischer (1968) proposed this correlation. The
coefficient  obtained  represents  the  correlation  between  a  person's 
item  response  and  the  item's  difficulty  value.    Donlon  and  Fischer 
define  item  difficulty  as  the  proportion  of  examinees  who  respond 
incorrectly  to  an  item.    Large  values  correspond  to  difficult  items 
and small values correspond to easy items. A positive correlation
represents good fit, indicating that a person tends to answer
correctly  items  that  are  easy  for  the  group  and  miss  the  more 
difficult  items.    Low  or  negative  correlations  represent  more 
divergent  response  patterns.    The  formula  to  compute  this  correlation 
is 

\[
\mathrm{PB} = \frac{Q_r - Q_c}{S_{Q_r}} \cdot \frac{J_r'}{Y} \tag{2}
\]

where $Q_r$ is the mean item difficulty for items answered, $Q_c$ is the
mean item difficulty for items answered correctly, $S_{Q_r}$ is the
standard deviation of item difficulty for items answered, $J_r'$ is the
number of items answered correctly divided by the number of items
answered, and Y is the ordinate from the standard normal curve at the
point separating the proportions $J_r'$ and $(1 - J_r')$.
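
The personal biserial can be sketched in Python with the standard library alone. This is an illustration under stated assumptions, not the computation used in the study: the difference between the mean difficulty of the items answered and of the items answered correctly is divided by the standard deviation of difficulty and multiplied by the proportion correct over the normal ordinate. The standard deviation is taken as the population form (pstdev), omitted items are coded as None, and the function name and example values are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def personal_biserial(responses, difficulties):
    """Personal biserial correlation for one examinee.

    responses    : 1 = correct, 0 = incorrect, None = omitted
    difficulties : proportion of the group answering each item
                   incorrectly (large values = hard items)
    """
    answered = [(u, q) for u, q in zip(responses, difficulties)
                if u is not None]
    q_answered = [q for _, q in answered]
    q_correct = [q for u, q in answered if u == 1]
    p = len(q_correct) / len(answered)  # proportion correct of answered
    if p in (0.0, 1.0):
        raise ValueError("undefined for all-correct or all-incorrect")
    s_q = pstdev(q_answered)
    if s_q == 0:
        raise ValueError("undefined when all difficulties are equal")
    std_normal = NormalDist()
    # ordinate at the point dividing the curve into p and 1 - p
    ordinate = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean(q_answered) - mean(q_correct)) / s_q * p / ordinate


difficulties = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical, easy to hard
pb_fit = personal_biserial([1, 1, 1, 0, 0], difficulties)
pb_misfit = personal_biserial([0, 0, 1, 1, 1], difficulties)
```

With these invented values the examinee who passes the easy items obtains a positive coefficient and the examinee who passes only the hard items obtains a negative one, in line with the interpretation given above.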

Norm  Conformity  Index  (NCI) 

This  index  was  developed  by  Tatsuoka  and  Tatsuoka  (1980).  The 
NCI  indicates  the  degree  of  concordance  to  a  group  response  pattern 
where  items  are  arranged  in  descending  order  of  difficulty,  from 
hardest  to  easiest.    Values  of  this  index  may  range  from  -1  to  1.  The 
smaller  or  more  negative  the  index,  the  more  divergent  is  the 
individual's  response  pattern  in  comparison  to  the  group  norm.  This 
index is undefined for either perfect or zero scores. Let S denote
the row vector of a person's response pattern; let $\bar{S}'$ denote the
transpose of the complement of S; and let $N = \bar{S}'S$. The formula to
compute this index is

\[
\mathrm{NCI} = \frac{2U_a}{U} - 1 \tag{3}
\]

where $U_a = \sum_{i} \sum_{j>i} n_{ij}$, the sum of the above-diagonal
elements of N, and $U = \sum_{i} \sum_{j} n_{ij}$, the sum of all the
elements of N.

Rasch Person-Fit Statistic (Rx²)

The Rx², also referred to as Maximum Likelihood Procedure (MAX)
(Wright  &  Panchapakesan,  1969)  and  as  weighted  total  fit  mean  square 
(Rudner,  1983),  was  adopted  for  use  with  the  one  parameter  Rasch  model 
and  is  calculated  using  the  BICAL  program  (Wright,  Mead,  &  Bell, 
1979).    This  index  has  a  high  value  for  an  examinee  who  has  a  response 
pattern  that  is  inconsistent  with  the  examinee's  score  and  the  Rasch 
model  measure  of  item  difficulty.    The  following  formula  is  used  to 
compute  this  index: 


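\[
Rx^2_i = \frac{\sum_{j} (u_{ij} - P_{ij})^2}{\sum_{j} P_{ij}(1 - P_{ij})} \tag{4}
\]

where $u_{ij}$ is the response of examinee i to item j and $P_{ij}$ is the
probability of a correct response for examinee i on item j as
predicted by the Rasch model:

\[
P_{ij} = \frac{e^{(\theta_i - b_j)}}{1 + e^{(\theta_i - b_j)}}
\]

with $\theta_i$ the ability of examinee i and $b_j$ the difficulty of item j.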
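
The Rasch probability and a person-fit mean square can be sketched in Python as follows. This is an illustration, not the BICAL algorithm: it takes the ability and item difficulties as already estimated, and it assumes the weighted form of the statistic, in which squared residuals summed over items are divided by the summed binomial variances. The function names and the example values are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def weighted_fit_mean_square(responses, theta, difficulties):
    """Squared residuals summed over items, divided by the summed
    model variances P(1 - P). Values near 1 are expected under the
    model; large values signal an inconsistent response pattern.
    """
    probs = [rasch_probability(theta, b) for b in difficulties]
    residual_ss = sum((u - p) ** 2 for u, p in zip(responses, probs))
    variance_sum = sum(p * (1.0 - p) for p in probs)
    return residual_ss / variance_sum


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical, easy to hard
theta = 0.0                                  # hypothetical ability
fit_consistent = weighted_fit_mean_square([1, 1, 1, 0, 0], theta, difficulties)
fit_aberrant = weighted_fit_mean_square([0, 0, 1, 1, 1], theta, difficulties)
```

Both examinees have the same total score, yet the one who misses the easy items and passes the hard ones receives a much larger fit value, which is the behavior described for this index.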

Extended Caution Index (ECI)

The  extended  indices  have  been  proposed  by  Tatsuoka  and  Linn 
(1981,  1983).    They  describe  the  extended  indices  as  linear 
transformations  of  the  distance  between  a  person's  response  pattern 
and  a  theoretical  curve.    In  the  case  of  ECI,  this  curve  is  the  group 
response  curve  (GRC),  which  is  "an  average  function  of  N  different 
Person  Response  Curves"  (Tatsuoka  &  Linn,  1981,  p.  10).    For  the  ECI, 
probabilities of success, calculated through item response theory
logistic models, replace the zeros and ones in the S-P table. For
the purposes of this study, the Rasch one-parameter logistic model will be
used  to  calculate  these  probabilities.    The  formula  for  the  ECI  is 

\[
\mathrm{ECI}_i = 1 - \frac{\sum_{j} (Y_{ij} - p_{i.})(Y_{.j} - p_{..})}{\sum_{j} \left(P_{ij} - \sum_{j} P_{ij}/J\right)(Y_{.j} - p_{..})} \tag{5}
\]

where $Y_{ij}$ is the response of examinee i to item j, $Y_{.j}$ is the sum
of responses across examinees for item j, $p_{i.}$ is the proportion of
correct responses of examinee i, $p_{..}$ is the total proportion of
correct responses, $P_{ij}$ is the probability of correct response for each
examinee i on item j according to the Rasch model, and $\sum_{j} P_{ij}/J$ is the
mean predicted probability of success for examinee i.

This index is based on the ratio of two covariances. The
higher the value of the ECI, the more the response pattern departs from
the expected response pattern. Like the NCI, this index cannot be used
with perfect or zero scores, because the denominator would become zero
and the value infinite.

Assumptions

The  following  underlying  assumptions  were  held  for  this  study: 

1.  Standardized  testing  situations  are  capable  of  inducing  test 
anxiety  among  students  who  would  normally  be  test  anxious. 

2.  Students  responded  truthfully  on  the  self-report  instrument 
used  to  assess  their  level  of  test  anxiety. 

3.  Total  score  on  the  achievement  test  used  can  be  taken  as  an 
estimate  of  the  examinee's  ability  (substituting  for  any  external 
measure  of  ability,  such  as  an  I.Q.  or  academic  aptitude  test  score). 

4.  Each  subtest  of  the  achievement  test  measures  a  fairly 
unidimensional  trait  (i.e.,  achievement  in  reading  or  mathematics  or 
science).    This  assumption  is  critical  for  the  indices  using  the  Rasch 
statistics,  but  it  also  underlies  the  assumptions  made  to  calculate 
the  other  indices. 

Educational  Significance 

The  potential  value  of  person-fit  indices  has  been  cited  by  Frary 
(1982),  Harnisch  (1983),  Harnisch  and  Linn  (1981),  Levine  and  Rubin 
(1979),  Rudner  (1983),  and  Van  der  Flier  (1982).    These  writers 
suggest  that  these  indices  could  be  useful  for  the  following  purposes: 

1.  To  identify  individuals  for  whom  the  test  is  inappropriate  or 
invalid.    Total  test  score  interpretation  can  be  misleading  for 
examinees  who  come  from  different  experiential  backgrounds  or  take  the 
test  under  different  motivational  dispositions,  e.g.,  test  anxiety. 

2.  To  identify  groups  with  different  instructional  practices  or 
histories,  which  could  change  the  difficulty  of  the  items,  e.g., 
schooling  differences. 


3.    To  identify  items  that  are  inadequate  for  particular  groups 
of  examinees. 

Presently  person-fit  indices  are  considered  to  be  at  a  state  of 
development  where  more  research  is  needed  to  investigate  their 
psychometric  properties  and  establish  their  applicability.  The 
reasons  why  some  people  are  misfits  are  not  clear.    If  test  anxiety 
can  be  identified  as  a  factor  associated  with  person  misfit,  then  the 
interpretive  value  of  person-fit  statistics  would  be  enhanced. 
Another  pragmatic  contribution  of  this  study  is  to  extend  the  body  of 
research  of  person-fit  statistics  by  providing  information  about 
1)  the  agreement  of  person  fit  classifications  across  different 
subject  matter  content  areas,  as  measured  by  the  subtests  of  the 
Metropolitan  Achievement  Test,  and  2)  the  degree  of  agreement  of 
person-fit  classifications  by  different  indices. 

Summary 

Analysis  of  item-response  patterns  provides  information  not 
contained in a total test score. Although the idea of using response
pattern information probably originated when Guttman (1941) introduced
the scalogram technique, it was not until the late 1970s and
early 1980s that a strong interest in person-fit statistics
developed.

Person-fit  statistics  quantify  the  degree  of  deviation  of  an 
examinee's  response  pattern  from  the  expected  response  pattern.  The 
development  and  application  of  person-fit  indices  is  at  a  fledgling 
stage.    More  research  is  needed  to  investigate  their  psychometric 
properties  and  establish  their  applicability.    Attempts  to  identify 
causes of person-misfit have remained mainly speculative. Recently
Frary  (1982)  and  Harnisch  and  Linn  (1981)  have  suggested  that  test 
anxiety  may  be  one  factor  that  can  explain  erratic  performance  of  an 
examinee  within  a  given  test  (e.g.,  missing  easy  items  while  answering 
more  difficult  items  correctly).    According  to  drive  theory  the 
relationship  between  test  anxiety  and  performance  is  moderated  by 
level  of  ability  and  task  difficulty.    Performance  on  specific  items 
might  not  only  be  dependent  on  the  item's  difficulty  and  the 
examinee's  ability  but  also  on  the  examinee's  test  anxiety. 

The  primary  purpose  of  this  study  was  to  establish  1)  the  degree 
of  linear  relationship  between  test  anxiety  and  an  examinee's  level  of 
misfit,  2)  the  degree  of  linear  relationship  between  ability  (total 
score  on  achievement  test)  and  level  of  misfit,  and  3)  the  extent  that 
variance  in  person-misfit  can  be  explained  by  a  linear  combination  of 
ability  level,  test  anxiety,  and  their  interaction.    Five  different 
indices of person-fit were used in this study: the MCI, PB, NCI, Rx²,
and  ECI. 

A  secondary  interest  was  to  investigate  the  relationship  among 
the  five  selected  person-fit  indices  within  and  across  subtests  of  a 
norm-referenced  achievement  battery  and  to  estimate  their  internal 
consistencies. 


CHAPTER  II 
REVIEW  OF  LITERATURE 


The  two  central  aspects  of  this  study  are  person-fit  statistics 
and  test  anxiety.  These  two  topics  provide  the  major  themes  for  the 
organization  of  the  literature  review  presented  in  this  chapter. 


Person-Fit  Measures 


Historical  Background 


During  the  late  1970s  and  early  1980s  there  has  been  an 
increasing  interest  in  the  development  and  application  of  statistical 
indices  to  identify  examinees  with  aberrant  item-response  patterns. 
Proponents  of  person-fit  statistics  indicate  that  these  indices  add  to 
the  information  provided  by  total  scores  and  can  also  be  used  to 
identify  potentially  inaccurate  total  scores  (Frary,  1982;  Harnisch, 
1983;  Harnisch  &  Linn,  1981;  Rudner,  1983). 

This trend toward using information from item-response patterns
is not new. According to Gaier and Lee (1953),
one  of  the  most  promising  trends  in  current  psychometric 
research  is  an  increasing  concern  with  methods  of 
evaluating  patterns  of  test  scores  and  test  responses 
.  .  .  our  initial  hypothesis  is  that  consideration  of 
response  configurations  will  yield  more  fruitful  results 
than  the  usual  methods  of  reporting  merely  the  total 
score  for  a  test  ...  a  total  score  may  thus  carry 
considerably  less  diagnostic  significance  than  a  direct 
and  detailed  analysis  of  test  responses  per  se. 
(p.  140) 

Guttman (1941, 1950) was one of the first writers to suggest that
some persons respond consistently to a given set of ordered stimuli
(test items) while others do not. According to Guttman (1950), "a
person who endorses a more extreme statement . . . should endorse all
less extreme statements if the statements are to be considered a
scale" (p. 62). Guttman's description of the basic procedure for the
scalogram technique of scale analysis is very similar to Sato's S-P
chart construction. He states that there are two basic steps in the
scalogram pattern formation. These are

first,  the  questions  are  ranked  in  order  of  "difficulty" 
with  the  "hardest"  questions,  i.e.,  the  ones  that  fewest 
persons  got  right,  placed  first  and  with  the  other 
questions  following  in  decreasing  order  of  "difficulty." 
Second,  the  people  are  ranked  in  order  of  "knowledge" 
with  the  "most  informed"  persons,  i.e.,  those  who  got 
all  questions  right,  placed  first,  the  other  individuals 
following  in  decreasing  order  of  "knowledge."  (Guttman, 
1950,  p.  70) 

Sato's  S-P  chart  is  also  a  two-dimensional  matrix  where  the  rows 
represent  the  students  ranked  from  highest  to  lowest  according  to 
total  test  score  (cited  in  Harnisch  &  Linn,  1981).    The  columns 
represent  the  items  arranged  from  left  to  right  in  ascending  order  of 
difficulty. Construction of a scalogram pattern and an S-P table
follows the same two steps. Once the responses are organized in this
fashion,  a  concordant  response  pattern  is  defined  as  the  case  when  an 
examinee  answers  the  items  correctly  until  he  or  she  reaches  an  item 
that is too difficult and answers all items incorrectly thereafter.
Some disruption of a perfect pattern can happen. As the response
pattern deviates more from the expected pattern, the degree of
aberrance increases. Visual identification of aberrant or erratic
response patterns becomes increasingly difficult as the number of
items in a test increases. The number of possible response patterns
multiplies as the number of items increases. With the recent
introduction  of  a  variety  of  statistics  to  measure  the  degree  of 
deviation  from  a  typical  response  pattern,  there  is  a  renewed  interest 
in  using  response  pattern  information. 

Types  of  Person-Fit  Indices 

Indices  measuring  the  degree  of  unusual  response  patterns  can  be 
categorized  into  three  major  types:    norm-comparison  indices, 
goodness-of-fit indices, and extended indices.

Norm-comparison  indices,  which  are  based  on  observed  patterns  of 
right  and  wrong  answers  and  are  calculated  with  summary  statistics 
based  on  the  norm  group,  include  Sato's  caution  index  (cited  in 
Harnisch  &  Linn,  1981),  the  modified  caution  index  (Harnisch  &  Linn, 
1981),  the  agreement,  disagreement,  and  dependability  indices  proposed 
by  Kane  and  Brennan  (1980),  the  U'  index  by  Van  der  Flier  (1977),  the 
personal  biserial  by  Donlon  and  Fischer  (1968),  and  the  norm- 
conformity  index  by  Tatsuoka  and  Tatsuoka  (1980).    Van  der  Flier's  U' 
index  and  Tatsuoka  and  Tatsuoka's  norm-conformity  index  have  been 
reported  to  have  a  perfect  negative  relationship  (Harnisch  &  Linn, 
1981).    Norm-comparison  indices  are  calculated  by  using  information 
organized in an S-P table. They indicate the degree of aberrance from
the  expected  response  pattern,  when  examinee  ability  is  defined  as  the 
total  observed  score  on  the  test. 

Goodness-of-fit  or  "appropriateness"  indices  are  based  on  item 
response  theory  (IRT)  (Levine  &  Rubin,  1979).    As  with  norm-comparison 
indices,  goodness-of-fit  indices  are  also  based  on  the  expected 
response pattern for an examinee at a given ability level. The
distinction is that for goodness-of-fit indices a more sophisticated
definition of "ability" is employed. Instead of simply equating
ability with the examinee's observed raw score on the test, ability is
defined in terms of his or her estimated score on a theoretical latent
continuum  underlying  test  performance.    There  are  two  popular  IRT 
models  that  estimate  examinee  abilities  based  on  the  latent  trait 
underlying  test  performance.    For  Rasch's  one-parameter  logistic 
model,  examinee  ability  estimates  are  determined  as  a  function  of  item 
difficulty  parameters.    A  widely  used  computer  program,  the  BICAL, 
written by Wright et al. (1979), provides examinee ability estimates,
item difficulty parameter estimates, and a person-fit statistic (Rx²)
which  "indicates  how  well  the  individual's  item  response  pattern  and 
the  Rasch  model  fit"  (Rudner,  1982,  p.  4). 

The second widely used IRT model is Birnbaum's three-parameter
logistic model (Lord & Novick, 1968, Ch. 17), for which examinee
ability estimates are determined as a function of item difficulty,
item discrimination, and guessing parameters. Levine and Rubin (1979)
developed three types of appropriateness indices based on Birnbaum's
three-parameter logistic model. These approaches are the marginal
probability,  the  likelihood  ratios,  and  the  estimated  ability 
variation  indices.    A  practical  limitation  in  using  these  procedures 
arises  from  the  large  sample  sizes  usually  recommended  to  obtain 
stable  estimates  from  the  three-parameter  model  (Hambleton  &  Cook, 
1977).    Levine  and  Rubin  devised  a  simulation  of  item  response  data  on 
the  Scholastic  Aptitude  Test  (SAT)  to  conform  to  normal  or  aberrant 
response patterns. Their findings indicate that all three types of
goodness-of-fit indices demonstrate the capability to detect aberrance
when present.
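
The two logistic models named above can be written as short functions. The following is a minimal sketch, not the BICAL implementation: the function names are hypothetical, and the scaling constant D = 1.7 in the three-parameter function is an assumed convention rather than something stated in this chapter.

```python
import math


def p_rasch(theta, b):
    """Rasch (one-parameter logistic) probability of a correct response.

    theta: examinee ability on the latent continuum
    b:     item difficulty on the same scale
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def p_three_parameter(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response.

    a: item discrimination, b: item difficulty, c: guessing parameter.
    D is a scaling constant (assumed here to be 1.7).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
```

When ability equals item difficulty, the Rasch probability is .50, and the three-parameter probability lies halfway between the guessing parameter and 1.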

Extended  caution  indices  have  been  proposed  by  Tatsuoka  and  Linn 
(1981,  1983)  as  a  link  between  the  norm-comparison  and  the  goodness- 
of-fit  indices.    They  linked  Sato's  S-P  theory  and  item  response 
theory  by  replacing  the  original  observed  zeros  and  ones  of  the  item 
scores  with  IRT  probabilities  of  passing  the  items.  These 
probabilities  were  then  used  in  the  calculation  of  the  caution 
indices.    Five  variations  of  the  extended  caution  index  were  created. 
These  extended  caution  indices  are  defined  as  "linear  transformations 
of  the  covariance  or  correlation  between  a  person's  response  pattern 
and  a  theoretical  curve"  (Tatsuoka  &  Linn,  1983,  p.  95).  Their 
findings  support  the  effectiveness  of  the  extended  indices  in 
identifying  examinees  who  use  erroneous  rules  in  answering  arithmetic 
test  problems.    Tatsuoka  and  Linn  (1983)  point  out  that  these  indices 
can  have  instrumental  utility  by  identifying  students  who  consistently 
make  errors  because  of  misconceptions. 

Comparative  Studies  of  Person-Fit  Indices 

There  have  been  several  studies  in  which  the  relationship  and 
effectiveness  of  person-fit  indices  have  been  compared.    Harnisch  and 
Linn  (1981)  made  a  comparative  analysis  of  ten  norm-comparison 
indices.    Using  mathematics  and  reading  tests  from  a  statewide 
assessment  program  they  also  examined  school  and  regional  differences. 
The intercorrelations between these indices ranged from |.13 to .99|
for mathematics and from |.34 to .96| for reading. They found that
Kane and Brennan's (1980) agreement index had the lowest correlation
with the other indices, but had the highest correlation (.99) with
total score. The modified caution index (MCI) was found to have the
lowest correlation with total score (-.02 for mathematics and -.21 for
reading). Harnisch and Linn (1981) found significant school and
regional differences in students' response patterns as measured by the
MCI. 

Rudner (1982, 1983) evaluated nine indices; four were norm-comparison
indices, while five were goodness-of-fit indices. He
generated data by simulating examinees and their responses through
Birnbaum's three-parameter model. Response patterns were altered to
simulate spuriously high or low respondents. Findings indicated that
the norm-comparison indices (point biserial correlation, PB, NCI, and
MCI) and the weighted total fit mean square or Rx2 were highly
intercorrelated (|.77 to .99|). The goodness-of-fit indices using
Birnbaum's three-parameter model and the unweighted total fit mean
square had lower intercorrelations (|.17 to .80|). Validity of the
indices  was  tested  by  observing  how  sensitive  they  were  to  assessment 
accuracy.    The  MCI  and  the  NCI  identified  comparable  proportions  of 
examinees  with  aberrant  response  patterns.    According  to  Rudner, 
"these  two  approaches  were  the  most  stable  of  the  statistics"  (Rudner, 
1983,  p.  217).    In  general,  indices  based  on  IRT  showed  better 
detection  rates  of  aberrant  response  patterns  than  the  norm-comparison 
indices. 

Frary (1982), using teacher-made multiple-choice tests, compared
three person-fit measures: the Rx2, the MCI, and a weighted choice
index. In the weighted choice index, distractor choice is considered
as part of the estimation of person-fit. The Rx2 and the MCI were
found to be highly correlated (.75). The smallest relationship
between any two of the three indices was between the Rx2 and the
weighted choice index (.42). In this study Frary was the first to
compute  and  report  person-fit  internal  consistency  estimates.    Low  and 
even  negative  split-half  coefficients  of  the  person-fit  measures  in 
his  study  were  found  (Frary,  1982). 

Person-Fit  Indices  Under  Study 

The  present  study  is  the  first  to  include  all  three  different 
types  of  person-fit  statistics  (i.e.,  the  norm-comparison  indices,  the 
goodness-of-fit indices, and the extended indices). The five indices
under study are the modified caution index (MCI), the personal
biserial correlation (PB), the norm-conformity index (NCI), the Rasch
person-fit statistic (Rx2), and the extended caution index (ECI).

The  MCI  was  chosen  for  use  in  this  study  because  it  was  found  to 
be  least  related  to  total  test  scores  (Harnisch  &  Linn,  1981)  and  is 
considered  stable  with  short  and  long  tests  (Rudner,  1983).    The  PB 
was  selected  because  it  has  been  in  use  for  a  longer  period  of  time 
than  more  recent  indices  and  is  generally  associated  with  classical 
test  theory.    Its  computations  are  relatively  simple  and  it  has  been 
found  to  be  very  efficient  with  shorter  classroom  tests  (Rudner, 
1983).    For  these  reasons  the  PB  could  be  useful  to  a  larger  number  of 
practitioners.    The  NCI  has  been  found  to  correlate  with  total  score 
somewhat  higher  than  other  indices  (Harnisch  &  Linn,  1981),  but  it  has 
nevertheless  been  recurrently  used  in  different  research  studies 
(Harnisch  &  Linn,  1981;  Rudner,  1982,  1983;  Van  der  Flier,  1977, 
1982). The NCI and the MCI are considered to be the most applicable
and stable under situations with long and short tests and spuriously
high  or  low  scores  (Rudner,  1983). 

The Rx2 index was selected for this study because of the
availability of the BICAL computer program. Having the Rx2
computations provided as part of the BICAL output makes the index
more usable to practitioners. The Rx2 is a goodness-of-fit or
appropriateness type of person-fit index. It uses the Rasch
one-parameter logistic model to estimate ability and item difficulty.
Appropriateness indices requiring use of the three-parameter logistic
model were not feasible for the present study because of the larger
sample size recommended to get consistent ability parameter estimates
(Hambleton & Cook, 1977).

The ECI represents a link between norm-comparison indices and
goodness-of-fit indices. Since no comparisons of the ECI with other
indices or computations with actual data are available in the
literature, this index was included to evaluate its relationship to the
other indices.

It is noteworthy that most previous studies of multiple
measures of person-fit have focused primarily upon intercorrelations
among these indices without investigating how they correlate with
measures of any trait other than achievement itself, as measured by
the test. The present study is somewhat broader in scope, since it
investigates how these indices relate to another variable, test anxiety.

Test  Anxiety 

Most  research  on  test  anxiety  has  considered  the  effects  of  test 
anxiety on total score. According to Tryon (1980), test anxiety
research findings present a consistent moderate negative correlation
between  test  anxiety  and  total  score  measures  of  achievement.  High- 
test-anxious  individuals  tend  to  score  lower  on  classroom  and  aptitude 
tests  (Alpert  &  Haber,  1960;  Harper,  1974;  Mandler  &  Sarason,  1952;  I. 
Sarason,  1963,  1975;  Spielberger,  Gonzalez,  Taylor,  Algaze,  &  Anton, 
1978). 

Several  researchers  have  tried  to  explain  why  test  anxiety 
affects  performance.    According  to  the  cognitive  attentional  theory  of 
test  anxiety  (CATTA),  introduced  by  Sarason  (1960)  and  extended  by 
Wine  (1971,  1980),  the  "major  cognitive  characteristics  of  test 
anxious  persons  are  negative  self-preoccupation,  and  attention  to 
evaluative  cues  to  the  detriment  of  test  cues"  (Wine,  1980,  p.  371). 
This  misdirection  of  attention,  both  in  the  pre-stages  of  evaluation 
(study  phase)  and  the  test-taking  situation,  may  limit  coding, 
retention,  and  retrieval  of  information  by  high-test-anxious 
individuals.    Difficulty  of  the  task  (e.g.,  difficult  items)  is 
expected  to  negatively  affect  attention.    Thus  according  to  CATTA, 
performance  of  test-anxious  persons  will  be  negatively  affected. 

The Spence-Taylor drive theory also predicts the effect of
anxiety on performance of tasks with varying levels of difficulty.
Heinrich and Spielberger (1982) summarize these predictions according
to the difficulty of the task. They explain that for high-anxious
students the performance of a task is dependent on its difficulty.
High  anxiety  may  facilitate  performance  on  easy  tasks,  interfere  with 
performance  on  harder  tasks,  and  be  dependent  on  the  stage  of  learning 
for  tasks  of  intermediate  difficulty.    Heinrich  and  Spielberger  (1982) 
explain the relationship between performance and the learning stage.
According to these authors, "high anxiety will be detrimental to
performance  early  in  learning  when  the  strength  of  correct  responses 
is  weak  relative  to  competing  error  tendencies.    Later  in  learning, 
high  anxiety  will  begin  to  facilitate  performance  as  correct  responses 
are  strengthened  and  error  tendencies  are  extinguished"  (Heinrich  & 
Spielberger,  1982,  p.  146). 

Varying  ability  levels  and  their  relationship  with  anxiety  and 
task difficulty are also considered by the Spence-Taylor drive theory.
According  to  Spielberger  (1971)  the  effect  of  anxiety  on  subjects  with 
different  ability  levels  will  be  subject  to  the  task  difficulty  and 
the  learning  stage  considered. 

These two theories, the CATTA and the Spence-Taylor drive theory,
point  to  the  possibility  that  test  anxiety  might  have  an  effect  at  the 
item  level  and  that  this  effect  might  be  dependent  on  ability  level. 
Person-fit  statistics  measure  the  deviation  from  an  expected  response 
pattern.    According  to  the  theory  of  person-fit,  in  a  good  fit  to  a 
response  pattern  "high  ability  examinees  are  expected  to  get  few  easy 
items  wrong,"  while  "low  ability  examinees  are  expected  to  get  few 
difficult  items  right"  (Rudner,  1983,  p.  207).    If  test  anxiety 
affects  performance  at  the  item  level,  it  might  be  a  factor  which 
contributes  to  erratic  performance  for  high  anxiety  examinees. 

Summary 

Literature  pertinent  to  person-fit  indices  and  test  anxiety  has 
been  reviewed  in  this  chapter.    Recent  literature  on  person-fit 
measures  shows  an  increasing  interest  in  the  development  and 
application of these indices. The idea of using information provided
by response patterns is not new; it was first introduced by Guttman
(1941)  with  the  scalogram  technique.    During  the  late  1970s  and  early 
1980s  a  number  of  different  person-fit  statistics  were  developed  as 
measures  of  the  degree  of  deviation  from  a  typical  response  pattern. 
These indices can be categorized into three major types:
norm-comparison indices, goodness-of-fit indices, and extended indices.

Five  person-fit  indices  were  selected  for  use  in  this  study. 
Three of these indices are norm-comparison indices (MCI, PB, and NCI),
one is a goodness-of-fit index (Rx2), and one belongs to the extended
category  of  indices  (ECI).    Three  major  research  studies  comparing 
person-fit  indices  were  reviewed.    Harnisch  and  Linn  (1981)  compared 
ten  norm-comparison  indices,  and  concluded  that  the  MCI  seemed  to  be 
the  most  promising  due  to  its  lower  correlation  to  total  score. 
Rudner  (1983)  evaluated  four  norm-comparison  indices  and  five 
goodness-of-fit indices. He found that goodness-of-fit indices showed
better  detection  rates  of  aberrant  response  patterns.    Frary  (1982) 
contributed  to  the  development  of  person-fit  indices  by  being  the 
first  to  study  their  internal  consistency.    He  found  low  split-half 
reliabilities. 

Test  anxiety  has  been  suggested  as  a  factor  that  could  affect 
performance  within  a  test  (Frary,  1982;  Harnisch  &  Linn,  1981).  Two 
theories, the CATTA and the Spence-Taylor drive theory, predict that
high  test  anxiety  will  negatively  affect  performance.    These  two 
theories  point  to  the  possibility  that  test  anxiety  might  have  an 
effect  at  the  item  level  and  that  this  effect  might  be  dependent  on 
ability  level.    Test  anxiety  could  thus  be  a  factor  that  contributes 
to  person-misfit. 


CHAPTER  III 
METHODOLOGY 

The  present  study  was  designed  to  investigate  the  relationship 
between  an  examinee's  level  of  test  anxiety  and  each  of  five  different 
person-fit  statistics  and  to  establish  if  this  relationship  is 
dependent  on  ability  level.    A  second  purpose  was  to  investigate  the 
correlation  of  person-fit  indices  within  and  across  different  subject 
areas  of  a  standardized  achievement  test  battery  and  to  assess  the 
internal  consistency  of  person-fit  indices.    An  existing  data  set  was 
analyzed to explore the nature of these relationships. A description
of  the  examinee  group,  instruments,  data-file  creation,  and  data 
analysis  methods  is  presented  in  this  chapter. 

Examinees 

The  data  pool  used  in  this  study  consisted  of  test  scores  and 
item  responses  from  225  seventh-graders  and  188  eighth-graders  from  a 
metropolitan  middle  school  in  north  central  Florida.    There  was  an 
almost  even  distribution  of  boys  and  girls  at  each  grade  level. 
Approximately  70%  of  the  examinees  were  white  and  30%  were  black  at 
each  grade  level.    The  school  population  is  heterogeneous  with  respect 
to socio-economic level.

Instruments

The Test Anxiety Scale for Adolescents (TASA) (Schmitt &
Crocker, 1982) was used to measure test anxiety as a trait. This
instrument is a modified version of the 37-item Mandler-Sarason Test
Anxiety Scale (Sarason, 1972). The TASA consists of 31 true-false
items and is designed for use with examinees in middle school or
junior-high grades. Unlike most other test anxiety scales for
children, all items on the TASA deal exclusively with examinee
feelings about tests. Sample items include

"I  worry  just  before  getting  a  test  back"; 

and 

"Sometimes  on  a  test  I  just  can't  think." 

Schmitt and Crocker (1982) have found the factor structure of the
TASA to be fairly similar to that reported for the adult test anxiety
scales. They reported a KR20 of .87 as a total score reliability
estimate for these middle school examinees.

The  Metropolitan  Achievement  Test  (MAT)  subscales  (Form  KS)  were 
used  to  measure  achievement  in  reading,  mathematics,  and  science 
(Prescott,  Balow,  Hogan,  &  Farr,  1978).    The  TASA  was  administered  in 
March, 1981, approximately two weeks prior to the school-wide
administration  of  standardized  achievement  tests.    The  MAT  was 
administered  by  school  staff  as  part  of  the  school  district's  regular 
testing  program  approximately  two  weeks  after  the  test  anxiety  scale 
was  given.    The  range  of  item  difficulties  of  the  MAT  subscales  for 
the  seventh  and  eighth  grades  is  respectively:    .21  to  .99  and  .28 
to  .99  for  reading;  .20  to  .97  and  .31  to  .97  for  mathematics; 
and  .29  to  .94  and  .34  to  .98  for  science. 


Creation  of  the  Data  File 


The  test  anxiety  scores,  coded  with  student  ID  number,  but  no 
other  identifying  information,  were  obtained  in  conjunction  with  a 
University  of  Florida  College  of  Education  inservice  training  project 
for  school  personnel  on  identifying  and  counseling  test-anxious 
students.    The  researcher  later  obtained  a  set  of  MAT  test  item 
responses  for  students  with  those  same  ID  numbers  from  the  county 
school  district  testing  office.    This  data  file  also  contained  some 
demographic  information  (i.e.,  sex,  race,  and  grade  level).    The  data 
file  used  for  analysis  in  this  study  was  created  by  matching  the  two 
examinee  data  files  on  student  ID  number  and  merging  the  files. 
Thirty-seven  of  the  students'  records  (13  of  the  seventh  graders  and 
24  of  the  eighth  graders)  in  the  merged  file  were  deleted  later, 
because  it  was  found  that  these  students  had  been  tested  out-of-grade 
level,  rather  than  on  the  test  form  taken  by  their  grade  peers. 

Calculation  of  Person-Fit  Statistics 

For  each  examinee  on  each  subtest,  five  different  indices  of 
person-fit  were  calculated.    These  were  the  modified  caution  index, 
the  personal  biserial  correlation,  the  norm-conformity  index,  the 
Rasch  person-fit  index,  and  the  extended  caution  index.    To  create  the 
data  file  containing  the  five  person-fit  indices  for  each  MAT  subtest 
at  each  grade  level,  the  original  item-examinee  response  matrix  was 
used. Each examinee's response* to each item was coded 0 or 1 in this
matrix. The data in this matrix were used to

1. compute total scores to get an ability estimate for each
student;

2. compute a mean score for each item as an estimate of item
difficulty (p value); and

3. reorganize the data into an S-P matrix by sorting examinees by
total score and items by difficulty.

*Blanks or omitted responses were treated as incorrect responses and
assigned a 0 value.

The  resulting  matrix  had  students  organized  by  ability  (from  most 
able  to  least  able)  and  items  organized  by  difficulty  (from  easiest  to 
hardest).    This  S-P  matrix  was  used  to  calculate  the  five  person-fit 
indices,  using  computer  programs  written  by  this  author  for  each 
person-fit statistic. Refer to Chapter I for the definitions of the formulas.
These  statistics  were  programmed  using  the  Statistical  Analysis  System 
(SAS)  package  (Helwig  &  Council,  1979).    The  accuracy  of  each 
programmed  computation  was  tested  using  the  dummy  data  set  given  by 
Harnisch  and  Linn  (1981,  p.  136). 
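
The three data-preparation steps described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the SAS programs used in the study: the function name `build_sp_matrix` and the representation of omitted responses as `None` are hypothetical, and ties in total score or item difficulty are left in their original order.

```python
def build_sp_matrix(responses):
    """Build an S-P matrix from raw item responses.

    responses: list of rows, one per examinee, in original item order.
               Each entry is 1 (correct), 0 (incorrect), or None (omitted).
    Returns the sorted 0/1 matrix plus the orderings and summary values.
    """
    # Blanks or omitted responses are scored as incorrect (0).
    scored = [[1 if r == 1 else 0 for r in row] for row in responses]
    n_examinees = len(scored)
    n_items = len(scored[0])

    # Step 1: total score per examinee as the ability estimate.
    totals = [sum(row) for row in scored]

    # Step 2: mean item score (p value) as the item difficulty estimate.
    p_values = [sum(row[j] for row in scored) / n_examinees
                for j in range(n_items)]

    # Step 3: sort examinees from most to least able and
    # items from easiest (highest p value) to hardest.
    person_order = sorted(range(n_examinees), key=lambda i: -totals[i])
    item_order = sorted(range(n_items), key=lambda j: -p_values[j])
    sp = [[scored[i][j] for j in item_order] for i in person_order]
    return sp, person_order, item_order, totals, p_values
```

In a perfectly fitting data set, each row of the returned matrix is a run of ones followed by a run of zeros; departures from that pattern are what the person-fit indices quantify.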

Analysis 

Means,  standard  deviations,  minimum  and  maximum  values  by  grade 
were  computed  for  each  person-fit  statistic,  ability  measure,  and  test 
anxiety  score.    For  the  person-fit  indices  and  ability  measures  these 
descriptive  statistics  were  calculated  for  each  subtest  of  the  MAT 
(reading,  mathematics,  and  science).    Correlations  among  fit 
statistics  and  between  person-fit  and  test  anxiety  and  between  person- 
fit  and  ability  measures  were  calculated. 

To  investigate  the  relationship  between  examinees'  level  of  test 
anxiety  and  degree  of  person-fit  and  to  study  if  this  relationship  is 
dependent upon ability level, a linear multiple regression analysis
was used. In this analysis person-fit measures were the dependent
variables  and  ability  (reading,  mathematics,  or  science)  and  TASA  were 
the  continuous  independent  variables.    The  model  used  for  each  ability 
measure  and  person-fit  index  is 

$$Y' = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$$

where $Y'$ = person-fit predicted by the model, $b_0$ = intercept value,
$b_1$ = regression slope for the ability independent variable, $X_1$ =
ability, estimated from total score, $b_2$ = regression slope for the
TASA independent variable, $X_2$ = TASA, $b_3$ = regression slope for
the interaction of ability and TASA, and $X_1 X_2$ = interaction
between ability and TASA.
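
The model above can be fitted by ordinary least squares. The sketch below is one way to do so in plain Python and is not the SAS procedure used in the study; the function name `fit_interaction_model` is hypothetical, and the normal equations are solved by simple Gauss-Jordan elimination, which is adequate for a four-coefficient model but is not how a statistical package would do it.

```python
def fit_interaction_model(ability, anxiety, fit):
    """Least-squares fit of Y' = b0 + b1*X1 + b2*X2 + b3*X1*X2.

    ability: X1 values (total scores), anxiety: X2 values (TASA scores),
    fit:     observed person-fit values (Y).
    Returns ([b0, b1, b2, b3], R squared).
    """
    rows = [[1.0, x1, x2, x1 * x2] for x1, x2 in zip(ability, anxiety)]
    k = 4

    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, fit))]
           for i in range(k)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    b = [aug[i][k] / aug[i][i] for i in range(k)]

    # Proportion of variance in person-fit explained by the model.
    predicted = [sum(bi * xi for bi, xi in zip(b, r)) for r in rows]
    mean_y = sum(fit) / len(fit)
    ss_res = sum((y - p) ** 2 for y, p in zip(fit, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in fit)
    return b, 1.0 - ss_res / ss_tot
```

One such model would be fitted for each combination of person-fit index, subtest, and grade level.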

To  estimate  the  internal  consistency  of  person-fit  indices,  items 
for  each  MAT  subtest  at  each  grade  level  were  divided  into  odd  and 
even subtests. The original sequential test item number was used for
this  split.    Odd-item  and  even-item  person-fit  statistics  were 
computed  by  following  the  sequence  of  steps  previously  described.  The 
fit  index  for  the  odd  items  was  correlated  with  the  fit  index  for  the 
even  items  and  the  resulting  correlation  was  corrected  using  the 
Spearman-Brown  formula  to  obtain  an  internal  consistency  estimate  for 
the  full-length  test. 
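
The split-half procedure just described can be sketched in a few lines. The function names below are hypothetical, and the person-fit values passed to `pearson_r` are assumed to have been computed separately for the odd-item and even-item halves.

```python
import math


def split_items(row):
    """Split one examinee's responses, given in original item order,
    into odd-numbered items (1, 3, 5, ...) and even-numbered items."""
    return row[0::2], row[1::2]


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to a full-length estimate."""
    return 2.0 * r_half / (1.0 + r_half)
```

For example, a correlation of .50 between odd-item and even-item fit values corresponds to a full-length estimate of about .67. A negative half-test correlation yields a negative corrected estimate.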

Summary 

A  linear  multiple  regression  analysis  was  used  to  investigate 
the  relationship  between  an  examinee's  level  of  test  anxiety  and  each 
of five different person-fit statistics and to study if this
relationship is dependent on ability level. The five person-fit
indices  included  in  this  study  were  the  modified  caution  index,  the 
personal  biserial  correlation,  the  norm  conformity  index,  the  Rasch 
person-fit  index,  and  the  extended  caution  index.    A  data  set  of  225 
seventh-grade  and  188  eighth-grade  examinees'  responses  to  the  Test 
Anxiety  Scale  for  Adolescents  and  to  the  Metropolitan  Achievement  Test 
reading,  mathematics,  and  science  subscales  was  used  to  compute 
person-fit  statistics  and  explore  the  nature  of  these  relationships. 
Correlations  of  person-fit  statistics  between  and  within  the  different 
subject  area  tests  were  examined.    Person-fit  split-half  reliabilities 
were also computed for each index for each content area and grade
level.


CHAPTER  IV 
RESULTS 

The  present  study  was  undertaken  to  answer  the  following 
questions  for  each  of  five  selected  indices  of  person-fit: 

1.  What  is  the  degree  of  linear  relationship  between  test 
anxiety  and  an  examinee's  level  of  misfit? 

2.  What  is  the  degree  of  linear  relationship  between  ability  (as 
defined  by  performance  on  the  current  test)  and  level  of  misfit? 

3. To what extent is variance in person misfit explained by a
linear combination of the variables: ability level, test anxiety,
and their interaction?

Analyses were also performed to explore the degree of relationship
among the five selected person-fit indices and to determine their
split-half reliabilities.

The  results  of  the  analyses  presented  in  this  chapter  have  been 
organized  beginning  with  results  of  simpler  analyses  and  proceeding  to 
those  of  greater  complexity.    The  issue  of  the  degree  of  relationship 
among  the  five  person-fit  indices  within  and  across  subtests  of  the 
MAT  is  addressed  in  the  next  section,  Descriptive  Statistics.  Data 
relevant  to  questions  1  and  2  (dealing  with  the  bivariate 
relationships  between  test  anxiety  and  misfit  and  between  ability 
level  and  misfit)  are  presented  next.    Results  of  multiple  regression 
analyses,  relevant  to  question  3  are  presented  in  the  third  section  of 
this chapter. The final section of this chapter contains the results
of the investigation of the internal consistency of person-fit
indices.

Descriptive  Statistics 

The  mean,  standard  deviation,  and  minimum  and  maximum  values  for 
person-fit  measures  by  grade  and  ability  subtests  are  presented  in 
Table  2.    Means  and  standard  deviations  for  each  person-fit  index  are 
very  similar  across  grades  and  ability  subject  tests.    For  the  reading 
test,  person-fit  measures  show  the  greatest  dispersion  between  the 
minimum  and  maximum  scores.    For  this  subtest  of  the  MAT,  the  maximum 
score observed for the personal biserial correlation in both the
seventh and eighth grades was greater than one. This was not a
computation  error  but  may  be  ascribed  to  sampling  error  or  violation 
of  the  assumption  of  an  underlying  normal  distribution  for  the 
dichotomous  item  response  variable.    Lord  and  Novick  discuss 
conditions  under  which  biserial  correlations  may  exceed  1.00  (Lord  & 
Novick,  1968,  p.  339).    Descriptive  statistics  for  ability  tests  and 
test  anxiety  for  the  seventh  and  eighth  grades  are  presented  in 
Table  3. 

Table 2

Descriptive Statistics for Person-Fit Measures by Grade and
Ability Test

Person-fit measures' intercorrelations for each grade level are
presented in Table 4. Seventh-grade intercorrelations are shown above
the diagonal and eighth-grade intercorrelations below the diagonal.
Results of these correlations indicate that person-fit measures are
highly related within the same subject test area. These correlations
ranged from |.78 to .99| and are all significant at an alpha level
of .0001. However, across tests in different subject areas, there was
little or no relationship between the same person-fit measure. The
Rasch person-fit measure (Rx2) had the lowest relationship to other
indices within same subject test. Interestingly, Rx2 is the only index
with significant intercorrelations with other indices across subject
areas.

Relationship  Between  Person-Fit,  Ability,  and 
Test  Anxiety 

Correlations  between  person-fit  measures,  total  ability  measures 
of  reading,  mathematics,  and  science  and  test  anxiety  for  each  grade 
are presented in Table 5. Correlations between person-fit measures
and their corresponding total ability scores ranged from |.00 to .50|.
In general the personal biserial (PB) index had the lowest correlation
with total score. This lower correlation was consistently observed
across subject test areas and grade levels. Correlation values between
person-fit measures and test anxiety ranged from |.02 to .22|. The
highest correlations between person-fit measures and test anxiety
occurred for seventh-grade science person-fit values and for
eighth-grade reading person-fit values.

Relationship  Between  Person-Fit  and  a  Linear  Combination 
of  Ability,  Test  Anxiety,  and  Their  Interaction 

Multiple  regression  results  for  the  model  with  ability,  test 
anxiety,  and  their  interaction  are  summarized  in  Table  6.    Values  of 
R2 for the model at each test area are reported. Models with
significant  ability  and  test  anxiety  interactions  are  identified  by 
having  their  corresponding  R2  underlined.    For  the  seventh-grade 
sample,  a  significant  proportion  of  variance  in  the  modified  caution 
index was explained for each of the three subject area tests by the
linear* combination of test anxiety, ability, and their interaction.
Only in reading, however, was the percentage of variance explained
greater than 10%, and in this case the interaction of ability and test
anxiety was significant. For the personal biserial, the
norm-conformity index, and the extended caution index, although several
significant R2 values were observed, these were all less than .10.
Thus it is difficult to interpret these relationships as being
substantially important. For the Rasch index of person-fit,
substantial proportions of variation were explained by the model in
the areas of math and science. No significant interaction effect of
ability and test anxiety was detected in either of these cases.

Table 5

Correlations Between Person-Fit Measures and Total Scores
on Corresponding Ability Test and Test Anxiety

                     Seventh Grade          Eighth Grade
Person-Fit          Total      TASA        Total      TASA

Reading Test
  MCI                .23**    -.18**        .32**      .07
  PB                -.05       .10          .18*      -.15*
  NCI                .03       .03          .20*      -.15*
  Rx2               -.11      -.02          .16*       .17*
  ECI                .02      -.08          .05        .14

Mathematics Test
  MCI               -.15*      .03          .09        .06
  PB                 .04       .04          .01       -.03
  NCI                .20**    -.05          .17*      -.08
  Rx2               -.32**     .10         -.30**      .11
  ECI               -.23**     .05          .20**      .08

Science Test
  MCI               -.25**     .17*         .45**      .03
  PB                 .08      -.09          .29**      .02
  NCI                .22**    -.16*         .44**     -.03
  Rx2               -.39**     .22**        .50**      .08
  ECI               -.28**     .18**        .50**      .04

Note: Higher misfit is represented by higher values on the MCI, Rx2,
and ECI and by lower values on the PB and NCI.
*Significant at α = .05.
**Significant at α = .01.

Table 6

Percentage of Variance (R2) in Person-Fit Explained by
the Combination of Examinee Ability, Test Anxiety, and
Their Interaction

                                   Person-Fit
Test              MCI        PB         NCI        Rx2        ECI

Seventh Grade
  Reading        _.11**_     .04*       .01        .02        .03
  Mathematics     .04*       .03        .05*       .11**      .06**
  Science         .07**      .01        .06**      .16**      .09**

Eighth Grade
  Reading         .12**      .05*       .05        .05*       .02
  Mathematics     .04        .04        .06*       .10**      .07**
  Science        _.23**_    _.13**_    _.21**_     .27**     _.25**_

*Significant at α = .05.
**Significant at α = .01.
Underlined values (shown between underscores): significant
interaction at α = .05.

For the eighth grade, the modified caution index again appears to
be sensitive to variation in examinees' level of test anxiety,
ability, and their interaction. The significant R² values exceeded
.10 for this index in reading and science. Science had the largest R²
(.23) and a significant interaction between test anxiety and ability.
For the personal biserial, the norm-conformity index, and the extended
caution index, significant R² values greater than .10 were found only
in the area of science, and for each of these cases, the interaction
between ability and test anxiety contributed significantly to the
overall model. Significant (and meaningful) R² values for the Rasch
χ² index were found for the areas of science and mathematics without
any interaction between ability and test anxiety.


*Curvilinear relationships between person-fit indices and ability and
TASA measures were tested and found not significant.


For  further  interpretations  of  these  results,  the  nature  of  the 
interaction  effect  of  test  anxiety  and  ability  on  person-fit  was 
examined. 

Significant interactions were followed up by plotting the
relationship between person-fit measures and TASA at selected ability
levels. The formula to calculate the regression line at each ability
level is

$$Y' = b_0 + b_1 X_1 + (b_2 + b_3 X_1) X_2$$

At any value of ability ($X_1$), a predicted person-fit measure
($Y'$) was calculated for different points of TASA ($X_2$). The intercept
of this model equals $(b_0 + b_1 X_1)$ and the slope equals $(b_2 + b_3 X_1)$.
Table 7 reports the slope and intercept estimates which were used in
plotting these lines for all cases in which R² exceeded .10.
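
The computation of these conditional regression lines can be illustrated with a brief sketch. The Python code below is illustrative only and is not the procedure used in the original analysis: it evaluates the intercept (b0 + b1X1) and the slope (b2 + b3X1) at each plotted ability level, assuming the rounded coefficients printed in Table 7 for the eighth-grade science MCI model and the science ability levels listed in the figure legends. The names conditional_line, predict, and ABILITY_LEVELS are hypothetical.

```python
# Rounded coefficients as printed in Table 7 for the eighth-grade
# science MCI model (assumed here; four-decimal rounding limits precision).
B0, B1, B2, B3 = 0.3126, -0.0013, 0.0064, -0.0002

# Science ability levels A-H as listed in the figure legends.
ABILITY_LEVELS = {
    "A": 16.76, "B": 21.33, "C": 25.89, "D": 30.46,
    "E": 35.02, "F": 39.59, "G": 44.15, "H": 48.72,
}


def conditional_line(ability):
    """Return (intercept, slope) of the person-fit-on-TASA line at one ability value."""
    intercept = B0 + B1 * ability
    slope = B2 + B3 * ability
    return intercept, slope


def predict(ability, tasa):
    """Predicted person-fit value Y' at a given ability (X1) and TASA score (X2)."""
    intercept, slope = conditional_line(ability)
    return intercept + slope * tasa


if __name__ == "__main__":
    for label, ability in ABILITY_LEVELS.items():
        intercept, slope = conditional_line(ability)
        print(f"{label}: intercept = {intercept:.4f}, slope = {slope:+.4f}")
```

With these rounded coefficients the slope is positive at level A (about +.003), close to zero at level E, and negative at level H (about -.003), so the sign of the relationship between TASA and the MCI changes across the ability range.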

The  regression  lines  resulting  from  these  computations  are  shown 
in  Figures  1-5.    It  should  be  noted  that  Figures  2-5  are  based  on  a 
single  grade-level  and  a  single  subject  area.    These  plots  depict  the 
nature  of  the  interaction  of  ability  and  test  anxiety  on  various 
indices  of  person  fit.    Although  the  points  of  intersection  of  these 
lines  may  vary,  the  same  general  pattern  of  relationship  between 
person-fit  and  test  anxiety  occurs  in  all  cases.    Namely,  this  pattern 
of  interaction  can  be  characterized  as  follows: 

1.  For  examinees  in  the  average  ability  ranges,  there  is  little 
or  no  relationship  between  test  anxiety  and  person-misfit.  (Note  the 
"flat"  slope  of  the  regression  line  for  Group  E  in  all  figures.) 

2.  As  examinee  ability  level  increases  (see  lines  for  Groups  F, 
G,  and  H),  the  slope  of  the  regression  lines  increases,  generally 
Table 7

Significant Ability and Test Anxiety Interactions and R² Increases
for Person-Fit Measures

Dependent                                     Standard             R²
Variable   Parameter             Estimate     Error       t        Increase

Seventh Grade--Reading
MCI        Intercept                          .0543        .46
           Slope-Ability          .0058       .0016       3.74**
           Slope-TASA             .0071       .0035       2.02*
           Slope-Ability*TASA    -.0002       .0001      -2.52*    .026
           (Model R² = .11**, F(3,220) = 9.30**)

Eighth Grade--Science
MCI        Intercept              .3126       .0593       5.22**
           Slope-Ability         -.0013       .0015       -.35
           Slope-TASA             .0064       .0037       1.73
           Slope-Ability*TASA    -.0002       .0001      -2.15*    .020
           (Model R² = .23**, F(3,182) = 18.00**)

PB         Intercept              .5323       .1199       4.44**
           Slope-Ability         -.0020       .0031       -.53
           Slope-TASA            -.0157       .0075      -2.22*
           Slope-Ability*TASA     .0005       .0002       2.66**   .024
           (Model R² = .13**, F(3,182) = 9.06**)

NCI        Intercept              .3727       .1210       3.08**
           Slope-Ability          .0021       .0031        .55
           Slope-TASA            -.0144       .0075      -1.91
           Slope-Ability*TASA     .0004       .0002       2.33*    .023
           (Model R² = .23**, F(3,182) = 17.33**)

ECI        Intercept              .2291       .2573
           Slope-Ability         -.0053       .0067
           Slope-TASA             .0327
           Slope-Ability*TASA    -.0011       .0004


[Figure 1 plot: modified caution index (vertical axis, .10 to .38; higher
values indicate more misfit) against TASA score (horizontal axis, 1.73 to
28.37), with one regression line per reading ability level: A = 12.35,
B = 18.16, C = 23.97, D = 29.78, E = 35.59 (mean), F = 41.40, G = 47.21,
H = 53.02.]

Figure 1. Relationship between the modified caution index and test
anxiety for seventh-grade examinees at different reading
ability levels.

[Figure 2 plot: modified caution index (vertical axis, .14 to .40; higher
values indicate more misfit) against TASA score (horizontal axis, 1.29 to
26.21), with one regression line per science ability level: A = 16.76,
B = 21.33, C = 25.89, D = 30.46, E = 35.02 (mean), F = 39.59, G = 44.15,
H = 48.72.]

Figure 2. Relationship between the modified caution index and test
anxiety for eighth-grade examinees at different science
ability levels.

[Figure 3 plot: personal biserial (vertical axis, .24 to .72; higher
values indicate less misfit) against TASA score (horizontal axis, 1.29 to
26.21), with one regression line per science ability level: A = 16.76,
B = 21.33, C = 25.89, D = 30.46, E = 35.02 (mean), F = 39.59, G = 44.15,
H = 48.72.]

Figure 3. Relationship between the personal biserial and test anxiety
for eighth-grade examinees at different science ability
levels.

[Figure 4 plot: norm conformity index (vertical axis, .25 to .77; higher
values indicate less misfit) against TASA score (horizontal axis, 1.29 to
26.21), with one regression line per science ability level: A = 16.76,
B = 21.33, C = 25.89, D = 30.46, E = 35.02 (mean), F = 39.59, G = 44.15,
H = 48.72.]

Figure 4. Relationship between the norm conformity index and test
anxiety for eighth-grade examinees at different science
ability levels.

[Figure 5 plot: extended caution index (vertical axis, -.80 to .56; higher
values indicate more misfit) against TASA score (horizontal axis, 1.29 to
26.21), with one regression line per science ability level: A = 16.76,
B = 21.33, C = 25.89, D = 30.46, E = 35.02 (mean), F = 39.59, G = 44.15,
H = 48.72.]

Figure 5. Relationship between the extended caution index and test
anxiety for eighth-grade examinees at different science
ability levels.

indicating an increasing negative relationship between test anxiety
and person-misfit. Specifically, high-ability, low-anxious examinees show
more misfit than high-ability, high-anxious examinees.

3. As examinee ability decreases (see lines for Groups C, B, and
A), the opposite effect occurs. Namely, low-ability, low-anxious
examinees show less misfit than low-ability, high-anxious examinees.

When the interaction was not significant, only the ability main effect
was significant. Table 8 presents the Type I sums of squares, which
measure the incremental sum of squares for the model as each variable was
added. For the Rasch person-fit index, the main effects of mathematics
and science ability were significant at both the seventh and eighth
grades. There was also a significant main effect of reading
ability for the model with the modified caution index at the eighth
grade.
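
Type I (sequential) sums of squares can be sketched as the successive reductions in residual sum of squares obtained as ability, TASA, and their product are entered in that order, which is the order of the rows in Table 8. The Python sketch below uses NumPy least squares on synthetic data; the data, the function names, and the software are assumptions for illustration and do not reproduce the original analysis.

```python
import numpy as np


def residual_ss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def type1_sums_of_squares(ability, tasa, fit):
    """Sequential SS for ability, then TASA, then the ability*TASA product."""
    ability = np.asarray(ability, dtype=float)
    tasa = np.asarray(tasa, dtype=float)
    fit = np.asarray(fit, dtype=float)

    terms = [("ability", ability), ("tasa", tasa), ("ability*tasa", ability * tasa)]
    X = np.ones((len(fit), 1))          # intercept-only model
    total = residual_ss(X, fit)         # equals the corrected total SS
    previous = total
    table = {}
    for name, column in terms:
        X = np.column_stack([X, column])
        current = residual_ss(X, fit)
        table[name] = previous - current  # increment due to this term
        previous = current
    table["error"] = previous
    table["r_squared"] = 1.0 - previous / total
    return table


if __name__ == "__main__":
    # Synthetic data: person-fit depends on ability only (hypothetical values).
    rng = np.random.default_rng(0)
    ability = rng.normal(35.0, 6.0, 200)
    tasa = rng.normal(14.0, 6.0, 200)
    fit = 0.3 - 0.004 * ability + rng.normal(0.0, 0.1, 200)
    for name, value in type1_sums_of_squares(ability, tasa, fit).items():
        print(f"{name}: {value:.4f}")
```

Because each increment is computed after the preceding terms are already in the model, the three increments and the error term add up to the total sum of squares, and the result depends on the order of entry.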

Internal  Consistency  of  Person-Fit  Statistics 

Corrected  split-half  internal  consistency  reliability 
coefficients  for  person-fit  measures  by  grade  and  subject  content  area 
are  reported  in  Table  9. 
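
A corrected split-half estimate of this kind can be sketched as follows. The sketch assumes that the correction applied is the Spearman-Brown formula, which is the usual correction for split-half coefficients but is not named in the text, and that a person-fit index has already been computed separately on the odd-numbered and even-numbered items for each examinee. The function names are hypothetical.

```python
import numpy as np


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2.0 * r_half / (1.0 + r_half)


def corrected_split_half(odd_index, even_index):
    """Correlate the two half-test person-fit values, then apply the correction."""
    r_half = float(np.corrcoef(odd_index, even_index)[0, 1])
    return spearman_brown(r_half)


if __name__ == "__main__":
    # Hypothetical half-test person-fit values for six examinees.
    odd = [0.21, 0.35, 0.18, 0.40, 0.27, 0.31]
    even = [0.25, 0.30, 0.22, 0.33, 0.29, 0.24]
    print(round(corrected_split_half(odd, even), 2))
```

Under this assumption, a half-test correlation of .39 would correspond to a corrected estimate of about .56.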

For  the  seventh  grade  sample,  the  internal  consistency  estimates 
ranged  from  .23  to  .56.    The  highest  person-fit  split-half  reliability 
estimates  were  found  consistently  for  the  reading  subtest.    For  the 
eighth  grade  the  range  of  values  was  from  .23  to  .39,  with  a  slight 
trend  for  reading  to  have  the  higher  values. 

Among person-fit indices, the Rx² index has the highest internal
consistency estimates for the eighth grade (ranging from .31 to .39),
but for the seventh grade, no person-fit index was consistently more
Table 8

Significant Main Effect of Ability as Predictor of Person-Fit
Measures

Dependent                               Type I        Mean
Variable   Source              df    Sum Squares    Squares       F        R²

Seventh Grade--Mathematics
Rx²        Model                3       .50           .17        9.20**   .11**
           Mathematics          1       .47           .47       25.80**
           TASA                 1       .01           .01         .47
           Mathematics*TASA     1       .02           .02        1.12
           Error              221      4.01           .02

Seventh Grade--Science
Rx²        Model                3       .50           .17       13.73**   .16**
           Science              1       .49           .49       39.85**
           TASA                 1       .01           .01        1.21
           Science*TASA         1       .00           .00         .01
           Error              221      2.70           .01

Eighth Grade--Reading
MCI        Model                3       .31           .10        7.87**   .12**
           Reading              1       .27           .27       20.38**
           TASA                 1       .04           .04        3.33
           Reading*TASA         1       .00           .00         .17
           Error              180      2.36           .01

Eighth Grade--Mathematics
Rx²        Model                3       .44           .15        7.01**   .10**
           Mathematics          1       .38           .38       18.06**
           TASA                 1       .01           .01
           Mathematics*TASA     1       .05           .05        2.73
           Error              184      3.86           .02

Eighth Grade--Science
Rx²        Model                3                                         .27**
           Science              1       .53           .53       53.32**
           TASA                 1       .01           .01         .31
           Science*TASA         1       .02           .02        3.07
           Error                                      .01

*Significant at α = .05.
**Significant at α = .01.

Table 9

Corrected Split-Half Reliability Estimates for
Person-Fit Measures by Grade and Ability Test

                        Person-Fit Index
Test           MCI     PB      NCI     Rx²     ECI

Seventh Grade
Reading        .56     .56     .49     .45
Mathematics    .28     .28     .18     .11     .29
Science        .25     .20     .22     .19     .25

Eighth Grade
Reading        .35     .37     .31     .23     .33
Mathematics    .29             .33     .39     .36
Science        .25     .23     .26     .31     .25

reliable than the others. Overall, apart from the reading subtest,
the reliabilities of the person-fit indices for the mathematics
and science subtests appear consistent across the various indices and
are relatively low.

Summary 

Results  can  be  summarized  as  follows: 

1.  Descriptive  statistics  for  person-fit  measures,  test  anxiety, 
and  ability  scores  were  very  similar  across  subject  content  area  and 
grades. 

2.  Intercorrelations  between  person-fit  measures  showed  that 
these  measures  are  highly  related  within  the  same  subject  content 
area.    Across  subject  area  little  or  no  relationship  was  found. 

3. Correlations between person-fit measures and their
corresponding total ability score ranged from |.00 to .50|. The PB
index had the lowest correlation with total score.

4. Correlations between person-fit measures and test anxiety
scores ranged from |.02 to .22|. In science, four out of five indices
were  significantly  related  to  test  anxiety  scores  for  seventh  graders. 
For  eighth  graders  in  reading,  three  of  the  five  indices  were 
significantly  related  to  test  anxiety. 

5. A significant proportion of variance in person-fit measures
was explained by the linear combination of test anxiety, ability, and
their interaction for the seventh and eighth grade reading MCI index
and for the eighth grade science MCI, PB, NCI, and ECI. Regression
lines depicting the nature of these interactions were presented for
selected ability levels. Significant and meaningful R² values
(greater than .10) for the Rasch person-fit index were found for
the areas of science and mathematics without any interaction between
ability and test anxiety.

6.  Corrected  split-half  internal  consistency  estimates  for  the 
person-fit  indices  ranged  from  .20  to  .56. 


CHAPTER  V 
DISCUSSION 

This study was conducted to investigate the relationship between
an examinee's level of person-fit and test anxiety, and to study the
effect of ability on this relationship. Five person-fit indices were
calculated  for  seventh  and  eighth  grade  students  who  had  taken  a  test- 
anxiety  self-report  test  and  the  reading,  mathematics,  and  science 
subtests  of  the  MAT. 

Discussion  of  results  will  focus  on  findings  about  (1)  the 
interrelationship  among  the  five  person-fit  indices,  (2)  the 
relationship  between  person-fit,  test  anxiety,  and  ability,  and 
(3)  the  reliability  of  the  five  person-fit  statistics  under  study. 

Relationships  Among  Person-Fit  Statistics 

Intercorrelations among measures of person-fit were quite high
within same-subject content areas. The correlation values ranged
from |.78 to .99|. These high intercorrelations among person-fit
statistics confirm previous research findings by Harnisch and
Linn (1981) and Rudner (1983). In particular, Harnisch and Linn
found intercorrelations among the MCI, PB, and NCI that ranged
from |.93 to .97| for mathematics tests and from |.89 to .95| for
reading tests. Rudner's intercorrelations among the MCI, PB, NCI, and
the Rx² for the simulated SAT test ranged from |.80 to .99| and
from |.77 to .99| for the simulated teacher-made biology test. These
consistently high intercorrelations among the person-fit indices under
study indicate that they seem to be measuring a common construct.

Relatively low correlations were found for any given index (MCI,
PB, NCI, Rx², and ECI) across the reading, mathematics, and science
tests. These correlations ranged from -.03 to .24. There seems to be
no persistence of misfit across the different tests. These results
call into question the notion that a tendency to misfit is a stable trait
which consistently manifests itself in examinee performance across
various subject areas. Frary's (1982) correlations of the same person-fit
index across several tests were also very low.

In practical terms, these results lead to two points that
should be considered when interpreting person-fit results. First,
if an examinee is identified as having poor person-fit by one
index, another index will probably identify him or her as a misfit on the
same test. Second, it cannot be concluded that he or she will also
be a misfit on another test.

Relationship  Between  Person-Fit,  Ability,  and  Test  Anxiety 

Low  to  moderate  correlation  values  were  obtained  between  person- 
fit  indices  and  their  corresponding  total  ability  scores.    The  Rasch 
person-fit  index  was  the  index  with  the  highest  correlations  with 
total  mathematics  and  science  ability  scores.    For  the  reading  test, 
the  MCI  had  the  highest  correlations  with  total  score,  but  this 
correlation  was  positive  while  a  negative  correlation  would  have  been 
expected.    The  reading  subtest  was  not  typical  of  a  power  test,  since 
most  examinees  did  not  finish  all  items.    More  able  examinees  probably 
attempted more items, passing end items that were more "difficult"
than some items which they had missed earlier in the test. This
caused more able students to get higher misfit classifications, as can
be seen in Figure 1. The only person-fit index that did not seem to
be affected by the speededness of the reading test was the Rx².
Although its relationship to total reading score was low, it was in
the direction expected.

The  PB  index  had  the  lowest  correlation  with  total  scores. 
Contrary  to  the  present  findings,  Harnisch  and  Linn  (1981)  found  that 
the  PB  was  one  of  the  indices  that  had  a  high  relationship  with  total 
score on the reading test (.63). The relationship of the PB to the
math total score in the Harnisch and Linn study was nevertheless
somewhat lower (.28) than for some other indices.

Correlation values between person-fit measures and test anxiety
were relatively low. These correlations ranged from |.02 to .22|.
The Rx² index had the highest correlations with test anxiety. The
only case in which the Rx² correlated lowest with test anxiety was for the
seventh-grade reading test. This low correlation is ascribed to the
speeded nature of the reading test, which was more noticeable at the
seventh grade. These correlation values are disappointingly low from
a practical perspective; however, one point that might be considered
is that these observed correlations may have been attenuated by the
extremely low reliability of the person-fit measures and by having a
general or trait measure of test anxiety instead of a state-specific
measure. As an exploratory exercise one could speculate about the
nature of this relationship if person-fit could be more reliably
measured. A correction for attenuation method (Magnusson, 1966,
p. 148) was used to estimate what the correlation between person-fit and
test anxiety would be if these measures were perfectly reliable. The
corrected correlations between person-fit measures and test anxiety
ranged from |.03 to .44|. Even with the correction for attenuation,
most of these correlations are still relatively low, and there is
little evidence offered by the present study to support the notion
that higher reliabilities can be achieved in person-fit measures.
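
The correction for attenuation can be written compactly. The sketch below assumes the classical formula, in which the observed correlation is divided by the square root of the product of the two reliabilities; the text cites Magnusson for the method but does not reproduce the formula, and the numerical values shown are hypothetical rather than taken from the study.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between two measures if both were perfectly reliable.

    r_xy  -- observed correlation between the two measures
    rel_x -- reliability of the first measure (for example, a person-fit index)
    rel_y -- reliability of the second measure (for example, a test anxiety scale)
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Hypothetical values: observed r = .20, reliabilities .40 and .80.
    print(round(correct_for_attenuation(0.20, 0.40, 0.80), 2))
```

With these hypothetical inputs the corrected value is about .35. The lower the reliabilities, the larger the upward adjustment, which is why corrected values should be read as upper-bound estimates rather than as observed relationships.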

In order to explore the extent to which variance in person misfit
can be explained by a linear combination of ability level, test
anxiety, and their interaction, a linear multiple regression model
containing these terms was tested for each person-fit statistic at each
grade level and for each ability measure. Because of the large sample
size, a number of fairly small R² values were statistically significant, so
only cases where there was a meaningful percentage of variance (larger
than 10%) were considered.

The  significant  interactions  between  ability  and  test  anxiety 
demonstrate  that  ability  levels  moderate  the  relationship  between 
person-fit  measures  and  test  anxiety.    For  lower-ability  examinees  a 
direct  relationship  between  person-fit  measures  and  test  anxiety  was 
found.    For  higher-ability  examinees  this  relationship  was  found  to  be 
inverse. 

Figures  1  and  2  present  the  two  general  pictures  of  these 
interactions.    Figure  1  shows  the  relationship  between  the  modified 
caution  index  and  test  anxiety  as  measured  by  TASA  for  seventh-grade 
examinees  with  different  reading  levels.    The  nature  of  the 
interaction  is  the  same  as  previously  described,  but  higher  ability 
level  students  have  higher  overall  MCI  values  (more  misfit).    For  all 
the other significant interactions, which were for person-fit measures
of  science  at  the  eighth  grade,  lower  ability  level  students  have 
higher  values  of  misfit.    Figure  2  shows  this  for  the  MCI.  The 
difference  found  for  these  two  content  areas  might  be  due  to  the 
performance  of  this  sample  on  the  reading  subtest.    More  than  10 
percent  of  the  seventh  graders  missed  the  last  fifteen  items  of  the 
reading  test,  making  it  appear  more  like  a  speeded  test  than  a  power 
test.    Higher  ability  examinees  probably  attempted  more  end  items, 
increasing  their  probability  of  getting  higher-misfit  classifications. 

The cognitive-attentional theory of test anxiety, the Spence-Taylor
drive theory, and previous person-fit research findings suggest
interpretive explanations of the interaction results. According to
Tobias  (1980)  and  Weinstein,  Cubberly,  and  Richardson  (1982)  high- 
test-anxious  students  will  have  worse  performance  on  difficult 
materials  compared  to  low-test-anxious  individuals,  while  with  easier 
material  little  difference  between  anxiety  levels  is  expected.  In 
reporting  results  about  cognitive  coping  behavior  and  anxiety,  Houston 
(1982)  suggests  that  "highly  trait-anxious  (and  test  anxious) 
individuals  tend  to  lack  organized  ways  of  coping  with  stress  and 
instead  ruminate  about  themselves  and  the  situation  in  which  they  find 
themselves"  (p.  198). 

Since high-test-anxious students' performance on more difficult
items is more affected by their anxiety, these examinees will have a
harder time coping as they reach items whose level of difficulty
approximates their ability level. Other testing strategies, such as
test wiseness, would not be readily available due to lack of
concentration. Examinees with high ability levels and low test
anxiety could take advantage of test-taking skills and answer items
correctly  beyond  their  ability  level.    Since  these  items  would  not  be 
answered  with  the  same  degree  of  certainty  as  easier  items,  more 
deviation  from  the  expected  response  pattern  could  occur  and  higher 
misfit would result. For examinees with high ability and high test
anxiety, coping and test-taking skills could be blocked, making them
consistently miss harder items. Lower misfit values would occur for
this group. Examinees with lower ability and lower test anxiety
levels would answer items correctly to the point where they reach
items at their ability level and then miss the harder items. Some
misfit could occur due to attempts at harder items. For examinees
with low ability and high test anxiety, distracting thoughts might
interfere with performance on almost all items. Even easy items (in
reference to their ability level) could be missed; this sporadic
answering pattern would classify this group as high misfits.

These interaction effects between ability and test anxiety seem
to appear more consistently for science, especially at the eighth
grade.    One  possible  explanation  is  that  the  standardized  science  test 
fits  the  curriculum  less  than  the  reading  or  math  tests.    The  mean 
item  difficulty  for  the  science  test  is  lower,  indicating  a  harder 
test.    Examinees  taking  the  science  examination  might  find  themselves 
in  a  more  ambiguous  and  hence  anxiety-producing  situation. 

These findings are primarily of theoretical interest to those who
may be interested in learning more about the constructs of test
anxiety or person-fit. At best the combination of ability, test
anxiety, and their interaction appears to account for only about
one-fourth of the variance in person-fit indices, and the increments in R²
obtained by using the interaction term in the regression model were
small. The interaction term accounted for an increase of only about
.02 in the R² values.

Reliability  of  Person-Fit  Measures 

Corrected  odd-even  split-half  reliability  estimates  of  person-fit 
indices  were  low.    These  coefficients  ranged  from  .20  to  .56.  These 
internal  consistency  estimates  were  highest  for  the  person-fit  indices 
computed  for  the  seventh-grade  reading  test.    Part  of  the  reason  for 
the  higher  reliabilities  could  be  ascribed  to  the  speeded  nature  of 
the  reading  test  at  this  grade  level  since  the  original  sequential- 
test  item  number  was  used  to  split  the  test  into  odd-even  subtests. 
Magnusson (1966) cautions against using split-half methods on timed tests.
He  states,  "the  time  limit  has  the  effect  that  in  reliability 
computations  with  split-half  methods  the  test's  reliability  tends  to 
be  overestimated"  (p.  114). 

Frary  (1982)  analyzed  the  internal  consistency  of  person-fit 
indices  and  also  found  low  and  even  negative  split-half  coefficients. 
His  findings  led  him  to  conclude  that  unexpected  responses  to  a  small 
number  of  items  contributed  to  high  misfit  classifications  and  that 
there  was  little  consistency  on  the  specific  items  contributing  more 
to  misfit. 

These  findings  certainly  seem  to  call  into  question  the  notion  of 
person-fit  as  a  stable  trait  that  can  be  reliably  measured  and  also 
question  the  potential  utility  of  these  indices.    Frary  summarizes 
this  concern  when  he  suggests  "that  use  of  person-fit  measures  for  any 
decision-making  purpose,  especially  with  respect  to  individual 
examinees, should be undertaken only with extreme caution and that
substantial  additional  research  will  be  required  before  they  can  be 
used  routinely"  (Frary,  1982,  p.  17). 

Summary 

Correlations among person-fit statistics were quite high within
same-subject content areas, but not across different-subject tests.
It can be generalized that if an examinee is identified as having poor
person-fit by one index, another index will probably identify this
examinee as a misfit on the same test, but it cannot be expected that
this examinee will also be a misfit on another test.

Significant  interactions  between  ability  and  test  anxiety 
demonstrate  that  ability  levels  moderate  the  relationship  between 
person-fit  measures  and  test  anxiety.    For  lower-ability  examinees  a 
direct  relationship  between  person-fit  measures  and  test  anxiety  was 
found.    For  higher-ability  examinees  this  relationship  was  inverse. 
The cognitive-attentional theory of test anxiety and the Spence-Taylor
drive theory suggest interpretive explanations for these interaction
results. These findings are of theoretical interest to those
interested  in  learning  more  about  the  constructs  of  test  anxiety  and 
person-fit.    At  best  the  combination  of  ability,  test  anxiety,  and 
their  interaction  appears  to  account  for  only  about  one-fourth  of  the 
variance  in  person-fit  indices. 

Internal  consistency  (split-half)  reliabilities  were  relatively 
low  and  the  present  study  offers  little  evidence  to  support  the  notion 
that  higher  reliability  of  person-fit  indices  could  be  achieved. 
According to these results the potential uses of person-fit indices
are  questionable  at  this  time.    More  research  is  needed  before  person- 
fit  indices  can  be  recommended  as  a  routine  measure  in  achievement 
tests.